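Exercises: An Introduction to Statistical Learning, with Applications in R, Section 2.4 Exercises: 1, 3, 8, 10
1,
We base the answer on the bias-variance decomposition of the test-set MSE mentioned for the regression model in the figure above.
A more flexible method is generally considered to have higher variance but lower bias: "In general, more flexible statistical methods have higher variance."
The irreducible error, i.e. the variance of the error term, is generally beyond our control, so we consider only the factors that affect bias and variance.
Put another way: a more flexible method can keep the bias small, and the irreducible error generally cannot be changed, so we can look directly at the variance. If the variance can be kept small, we can choose a more flexible method; otherwise we should consider an inflexible one.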
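For reference, the decomposition this answer relies on is equation (2.7) of ISLR, the expected test MSE at a point $x_0$. It is restated here only as a reminder of the three terms discussed above:

```latex
E\bigl(y_0 - \hat{f}(x_0)\bigr)^2
  = \operatorname{Var}\bigl(\hat{f}(x_0)\bigr)
  + \bigl[\operatorname{Bias}\bigl(\hat{f}(x_0)\bigr)\bigr]^2
  + \operatorname{Var}(\varepsilon)
```

The first two terms depend on the chosen method; $\operatorname{Var}(\varepsilon)$ is the irreducible error and is the same for every method.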
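(1) The sample size is very large, but the number of predictors (features, i.e. input variables) is small:
A flexible model has low bias, and because p is small there are few parameters to fit (few parameters to estimate). For the effect of a larger sample size, see the variance formula for the model's predicted values below.
We can take it that, as the sample size grows, the variance of a more flexible model is kept under control, so we still choose the more flexible model; that is, the more flexible model will perform better.
(2) The number of observations is small, but the number of predictors (input variables) is large:
This is the reverse of the previous case. A more flexible model can address the bias, but its variance cannot be controlled, and with so few observations it overfits easily, which gives high variance. An inflexible method keeps the variance under control, so we choose the inflexible method; that is, the more flexible model is likely to perform worse.
(3) The relationship between the response and the predictors is highly non-linear:
A highly non-linear relationship calls for a complex fit. A more flexible model can reduce the bias, which an inflexible model can hardly do; the more flexible model will perform better.
(4) The variance of the error term (the irreducible error) is very high:
The irreducible error itself is beyond our control and is the same for every method, so on its own it says nothing about bias or variance. However, a more flexible model, although lower in bias, tends to fit the noise, and with a very noisy response this inflates its variance; the more flexible model is therefore likely to perform worse.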
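The reasoning in cases (1) to (3) can be checked with a small simulation. The sketch below is a one-dimensional stand-in, not the exercise's setting: polynomial degree plays the role of flexibility, and the true function, noise level, sample sizes and the names `f_true`, `noise_sd` and `mean_test_mse` are all assumptions chosen for illustration.

```r
# Sketch: compare an inflexible fit (straight line) with a flexible one
# (degree-10 polynomial) on simulated data. All settings are assumptions.
set.seed(1)

f_true   <- function(x) sin(2 * x)  # assumed non-linear truth
noise_sd <- 1                       # assumed sd of the irreducible error

# Average test MSE of both fits over `reps` simulated training sets of size n
mean_test_mse <- function(n, reps = 100, n_test = 500) {
  out <- replicate(reps, {
    train   <- data.frame(x = runif(n, -3, 3))
    train$y <- f_true(train$x) + rnorm(n, sd = noise_sd)
    test    <- data.frame(x = runif(n_test, -3, 3))
    test$y  <- f_true(test$x) + rnorm(n_test, sd = noise_sd)

    fit_lin  <- lm(y ~ x, data = train)
    fit_flex <- lm(y ~ poly(x, 10), data = train)

    c(linear   = mean((test$y - predict(fit_lin,  newdata = test))^2),
      flexible = mean((test$y - predict(fit_flex, newdata = test))^2))
  })
  rowMeans(out)
}

res <- rbind(small_n = mean_test_mse(15),
             large_n = mean_test_mse(2000))
print(round(res, 3))
```

With many observations the flexible fit should come out ahead because its variance is small and the linear fit cannot remove its bias; with only 15 observations the flexible fit's variance dominates.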
3,
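Since only a sketch is required, a rough likeness is enough; simulated data could of course also be used to illustrate.
The quantities to show are the bias, the variance, the training and test error, and the Bayes error curve (presumably, in a classification problem, the theoretical minimum misclassification rate; that is, the irreducible error, the lowest term).
First, the training and test error curves can follow the form shown in the slides:
The explanation is that as model flexibility increases, i.e. as the model becomes more complex, we can easily reduce the bias on the training set, but this brings overfitting with it. So the test error, for which we use MSE here, is generally U-shaped: it first falls and then rises. The training error decreases monotonically.
Next, the curves of the bias-variance decomposition can also be borrowed:
Altogether, our sketch of the curves is as follows:
The explanation is as follows:
First, the training error: as model complexity (flexibility) increases, the bias is easily reduced and the error on the training set falls. A sufficiently flexible model can fit the training data perfectly, so the training error keeps decreasing and can even approach 0;
Next, the test error: likewise, at first the bias decreases and the variance increases, but the bias falls faster, so the error curve goes down. As the model becomes still more flexible, the increase in variance (overfitting) outweighs the decrease in bias and the error rises; this is the bias-variance trade-off;
As for the bias: as noted above, it keeps decreasing as the model becomes more complex and flexible; it is essentially the difference between the model's average prediction and the true value;
As for the variance: the more flexible the model, the more easily it overfits, so the variance keeps increasing;
As for the irreducible error: it is the theoretical lower bound on the test error, so it is a horizontal line representing the lowest error any model can achieve given the noise inherent in the data; it does not change with the flexibility of the method.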
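The training and test curves described above can also be produced from simulated data rather than drawn by hand. This is a minimal sketch under assumed settings (true function `f_true`, noise sd 0.5, polynomial degree as the flexibility axis); none of these choices come from the exercise.

```r
# Sketch: training and test MSE as flexibility (polynomial degree) grows.
set.seed(2)
f_true <- function(x) sin(2 * x)        # assumed true function
n <- 100
train   <- data.frame(x = runif(n, -3, 3))
train$y <- f_true(train$x) + rnorm(n, sd = 0.5)
test    <- data.frame(x = runif(1000, -3, 3))
test$y  <- f_true(test$x) + rnorm(1000, sd = 0.5)

degrees <- 1:12
# One column per degree; rows are the training and test MSE
mse <- sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)
  c(train = mean(residuals(fit)^2),
    test  = mean((test$y - predict(fit, newdata = test))^2))
})

matplot(degrees, t(mse), type = "b", pch = 1:2, lty = 1,
        col = c("grey40", "red"),
        xlab = "Flexibility (polynomial degree)", ylab = "MSE")
abline(h = 0.5^2, lty = 2)              # irreducible error Var(eps) = 0.25
legend("topright",
       legend = c("training MSE", "test MSE", "irreducible error"),
       col = c("grey40", "red", "black"), lty = c(1, 1, 2), pch = c(1, 2, NA))
```

Because the polynomial models are nested, the training MSE cannot increase with the degree; the test MSE typically falls at first and rises again once the extra flexibility mostly fits noise.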
8,
# a,b
# Note: the dataset must be downloaded from https://www.statlearning.com/resources-first-edition
library(tidyverse)
# Read the data
college <- read_csv("/data1/project/College.csv",col_names = TRUE)
college
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ── ✔ dplyr 1.1.4 ✔ readr 2.1.5 ✔ forcats 1.0.0 ✔ stringr 1.5.1 ✔ ggplot2 3.5.1 ✔ tibble 3.2.1 ✔ lubridate 1.9.4 ✔ tidyr 1.3.1 ✔ purrr 1.0.2 ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ── ✖ dplyr::filter() masks stats::filter() ✖ dplyr::lag() masks stats::lag() ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors New names: • `` -> `...1` Rows: 777 Columns: 19 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," chr (2): ...1, Private dbl (17): Apps, Accept, Enroll, Top10perc, Top25perc, F.Undergrad, P.Undergr... ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
| ...1 | Private | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal | PhD | Terminal | S.F.Ratio | perc.alumni | Expend | Grad.Rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <chr> | <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| Abilene Christian University | Yes | 1660 | 1232 | 721 | 23 | 52 | 2885 | 537 | 7440 | 3300 | 450 | 2200 | 70 | 78 | 18.1 | 12 | 7041 | 60 |
| Adelphi University | Yes | 2186 | 1924 | 512 | 16 | 29 | 2683 | 1227 | 12280 | 6450 | 750 | 1500 | 29 | 30 | 12.2 | 16 | 10527 | 56 |
| Adrian College | Yes | 1428 | 1097 | 336 | 22 | 50 | 1036 | 99 | 11250 | 3750 | 400 | 1165 | 53 | 66 | 12.9 | 30 | 8735 | 54 |
| Agnes Scott College | Yes | 417 | 349 | 137 | 60 | 89 | 510 | 63 | 12960 | 5450 | 450 | 875 | 92 | 97 | 7.7 | 37 | 19016 | 59 |
| Alaska Pacific University | Yes | 193 | 146 | 55 | 16 | 44 | 249 | 869 | 7560 | 4120 | 800 | 1500 | 76 | 72 | 11.9 | 2 | 10922 | 15 |
| Albertson College | Yes | 587 | 479 | 158 | 38 | 62 | 678 | 41 | 13500 | 3335 | 500 | 675 | 67 | 73 | 9.4 | 11 | 9727 | 55 |
| Albertus Magnus College | Yes | 353 | 340 | 103 | 17 | 45 | 416 | 230 | 13290 | 5720 | 500 | 1500 | 90 | 93 | 11.5 | 26 | 8861 | 63 |
| Albion College | Yes | 1899 | 1720 | 489 | 37 | 68 | 1594 | 32 | 13868 | 4826 | 450 | 850 | 89 | 100 | 13.7 | 37 | 11487 | 73 |
| Albright College | Yes | 1038 | 839 | 227 | 30 | 63 | 973 | 306 | 15595 | 4400 | 300 | 500 | 79 | 84 | 11.3 | 23 | 11644 | 80 |
| Alderson-Broaddus College | Yes | 582 | 498 | 172 | 21 | 44 | 799 | 78 | 10468 | 3380 | 660 | 1800 | 40 | 41 | 11.5 | 15 | 8991 | 52 |
| Alfred University | Yes | 1732 | 1425 | 472 | 37 | 75 | 1830 | 110 | 16548 | 5406 | 500 | 600 | 82 | 88 | 11.3 | 31 | 10932 | 73 |
| Allegheny College | Yes | 2652 | 1900 | 484 | 44 | 77 | 1707 | 44 | 17080 | 4440 | 400 | 600 | 73 | 91 | 9.9 | 41 | 11711 | 76 |
| Allentown Coll. of St. Francis de Sales | Yes | 1179 | 780 | 290 | 38 | 64 | 1130 | 638 | 9690 | 4785 | 600 | 1000 | 60 | 84 | 13.3 | 21 | 7940 | 74 |
| Alma College | Yes | 1267 | 1080 | 385 | 44 | 73 | 1306 | 28 | 12572 | 4552 | 400 | 400 | 79 | 87 | 15.3 | 32 | 9305 | 68 |
| Alverno College | Yes | 494 | 313 | 157 | 23 | 46 | 1317 | 1235 | 8352 | 3640 | 650 | 2449 | 36 | 69 | 11.1 | 26 | 8127 | 55 |
| American International College | Yes | 1420 | 1093 | 220 | 9 | 22 | 1018 | 287 | 8700 | 4780 | 450 | 1400 | 78 | 84 | 14.7 | 19 | 7355 | 69 |
| Amherst College | Yes | 4302 | 992 | 418 | 83 | 96 | 1593 | 5 | 19760 | 5300 | 660 | 1598 | 93 | 98 | 8.4 | 63 | 21424 | 100 |
| Anderson University | Yes | 1216 | 908 | 423 | 19 | 40 | 1819 | 281 | 10100 | 3520 | 550 | 1100 | 48 | 61 | 12.1 | 14 | 7994 | 59 |
| Andrews University | Yes | 1130 | 704 | 322 | 14 | 23 | 1586 | 326 | 9996 | 3090 | 900 | 1320 | 62 | 66 | 11.5 | 18 | 10908 | 46 |
| Angelo State University | No | 3540 | 2001 | 1016 | 24 | 54 | 4190 | 1512 | 5130 | 3592 | 500 | 2000 | 60 | 62 | 23.1 | 5 | 4010 | 34 |
| Antioch University | Yes | 713 | 661 | 252 | 25 | 44 | 712 | 23 | 15476 | 3336 | 400 | 1100 | 69 | 82 | 11.3 | 35 | 42926 | 48 |
| Appalachian State University | No | 7313 | 4664 | 1910 | 20 | 63 | 9940 | 1035 | 6806 | 2540 | 96 | 2000 | 83 | 96 | 18.3 | 14 | 5854 | 70 |
| Aquinas College | Yes | 619 | 516 | 219 | 20 | 51 | 1251 | 767 | 11208 | 4124 | 350 | 1615 | 55 | 65 | 12.7 | 25 | 6584 | 65 |
| Arizona State University Main campus | No | 12809 | 10308 | 3761 | 24 | 49 | 22593 | 7585 | 7434 | 4850 | 700 | 2100 | 88 | 93 | 18.9 | 5 | 4602 | 48 |
| Arkansas College (Lyon College) | Yes | 708 | 334 | 166 | 46 | 74 | 530 | 182 | 8644 | 3922 | 500 | 800 | 79 | 88 | 12.6 | 24 | 14579 | 54 |
| Arkansas Tech University | No | 1734 | 1729 | 951 | 12 | 52 | 3602 | 939 | 3460 | 2650 | 450 | 1000 | 57 | 60 | 19.6 | 5 | 4739 | 48 |
| Assumption College | Yes | 2135 | 1700 | 491 | 23 | 59 | 1708 | 689 | 12000 | 5920 | 500 | 500 | 93 | 93 | 13.8 | 30 | 7100 | 88 |
| Auburn University-Main Campus | No | 7548 | 6791 | 3070 | 25 | 57 | 16262 | 1716 | 6300 | 3933 | 600 | 1908 | 85 | 91 | 16.7 | 18 | 6642 | 69 |
| Augsburg College | Yes | 662 | 513 | 257 | 12 | 30 | 2074 | 726 | 11902 | 4372 | 540 | 950 | 65 | 65 | 12.8 | 31 | 7836 | 58 |
| Augustana College IL | Yes | 1879 | 1658 | 497 | 36 | 69 | 1950 | 38 | 13353 | 4173 | 540 | 821 | 78 | 83 | 12.7 | 40 | 9220 | 71 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| Westfield State College | No | 3100 | 2150 | 825 | 3 | 20 | 3234 | 941 | 5542 | 3788 | 500 | 1300 | 75 | 79 | 15.7 | 20 | 4222 | 65 |
| Westminster College MO | Yes | 662 | 553 | 184 | 20 | 43 | 665 | 37 | 10720 | 4050 | 600 | 1650 | 66 | 70 | 12.5 | 20 | 7925 | 62 |
| Westminster College | Yes | 996 | 866 | 377 | 29 | 58 | 1411 | 72 | 12065 | 3615 | 430 | 685 | 62 | 78 | 12.5 | 41 | 8596 | 80 |
| Westminster College of Salt Lake City | Yes | 917 | 720 | 213 | 21 | 60 | 979 | 743 | 8820 | 4050 | 600 | 2025 | 68 | 83 | 10.5 | 34 | 7170 | 50 |
| Westmont College | No | 950 | 713 | 351 | 42 | 72 | 1276 | 9 | 14320 | 5304 | 490 | 1410 | 77 | 77 | 14.9 | 17 | 8837 | 87 |
| Wheaton College IL | Yes | 1432 | 920 | 548 | 56 | 84 | 2200 | 56 | 11480 | 4200 | 530 | 1400 | 81 | 83 | 12.7 | 40 | 11916 | 85 |
| Westminster College PA | Yes | 1738 | 1373 | 417 | 21 | 55 | 1335 | 30 | 18460 | 5970 | 700 | 850 | 92 | 96 | 13.2 | 41 | 22704 | 71 |
| Wheeling Jesuit College | Yes | 903 | 755 | 213 | 15 | 49 | 971 | 305 | 10500 | 4545 | 600 | 600 | 66 | 71 | 14.1 | 27 | 7494 | 72 |
| Whitman College | Yes | 1861 | 998 | 359 | 45 | 77 | 1220 | 46 | 16670 | 4900 | 750 | 800 | 80 | 83 | 10.5 | 51 | 13198 | 72 |
| Whittier College | Yes | 1681 | 1069 | 344 | 35 | 63 | 1235 | 30 | 16249 | 5699 | 500 | 1998 | 84 | 92 | 13.6 | 29 | 11778 | 52 |
| Whitworth College | Yes | 1121 | 926 | 372 | 43 | 70 | 1270 | 160 | 12660 | 4500 | 678 | 2424 | 80 | 80 | 16.9 | 20 | 8328 | 80 |
| Widener University | Yes | 2139 | 1492 | 502 | 24 | 64 | 2186 | 2171 | 12350 | 5370 | 500 | 1350 | 88 | 86 | 12.6 | 19 | 9603 | 63 |
| Wilkes University | Yes | 1631 | 1431 | 434 | 15 | 36 | 1803 | 603 | 11150 | 5130 | 550 | 1260 | 78 | 92 | 13.3 | 24 | 8543 | 67 |
| Willamette University | Yes | 1658 | 1327 | 395 | 49 | 80 | 1595 | 159 | 14800 | 4620 | 400 | 790 | 91 | 94 | 13.3 | 37 | 10779 | 68 |
| William Jewell College | Yes | 663 | 547 | 315 | 32 | 67 | 1279 | 75 | 10060 | 2970 | 500 | 2600 | 74 | 80 | 11.2 | 19 | 7885 | 59 |
| William Woods University | Yes | 469 | 435 | 227 | 17 | 39 | 851 | 120 | 10535 | 4365 | 550 | 3700 | 39 | 66 | 12.9 | 16 | 7438 | 52 |
| Williams College | Yes | 4186 | 1245 | 526 | 81 | 96 | 1988 | 29 | 19629 | 5790 | 500 | 1200 | 94 | 99 | 9.0 | 64 | 22014 | 99 |
| Wilson College | Yes | 167 | 130 | 46 | 16 | 50 | 199 | 676 | 11428 | 5084 | 450 | 475 | 67 | 76 | 8.3 | 43 | 10291 | 67 |
| Wingate College | Yes | 1239 | 1017 | 383 | 10 | 34 | 1207 | 157 | 7820 | 3400 | 550 | 1550 | 69 | 81 | 13.9 | 8 | 7264 | 91 |
| Winona State University | No | 3325 | 2047 | 1301 | 20 | 45 | 5800 | 872 | 4200 | 2700 | 300 | 1200 | 53 | 60 | 20.2 | 18 | 5318 | 58 |
| Winthrop University | No | 2320 | 1805 | 769 | 24 | 61 | 3395 | 670 | 6400 | 3392 | 580 | 2150 | 71 | 80 | 12.8 | 26 | 6729 | 59 |
| Wisconsin Lutheran College | Yes | 152 | 128 | 75 | 17 | 41 | 282 | 22 | 9100 | 3700 | 500 | 1400 | 48 | 48 | 8.5 | 26 | 8960 | 50 |
| Wittenberg University | Yes | 1979 | 1739 | 575 | 42 | 68 | 1980 | 144 | 15948 | 4404 | 400 | 800 | 82 | 95 | 12.8 | 29 | 10414 | 78 |
| Wofford College | Yes | 1501 | 935 | 273 | 51 | 83 | 1059 | 34 | 12680 | 4150 | 605 | 1440 | 91 | 92 | 15.3 | 42 | 7875 | 75 |
| Worcester Polytechnic Institute | Yes | 2768 | 2314 | 682 | 49 | 86 | 2802 | 86 | 15884 | 5370 | 530 | 730 | 92 | 94 | 15.2 | 34 | 10774 | 82 |
| Worcester State College | No | 2197 | 1515 | 543 | 4 | 26 | 3089 | 2029 | 6797 | 3900 | 500 | 1200 | 60 | 60 | 21.0 | 14 | 4469 | 40 |
| Xavier University | Yes | 1959 | 1805 | 695 | 24 | 47 | 2849 | 1107 | 11520 | 4960 | 600 | 1250 | 73 | 75 | 13.3 | 31 | 9189 | 83 |
| Xavier University of Louisiana | Yes | 2097 | 1915 | 695 | 34 | 61 | 2793 | 166 | 6900 | 4200 | 617 | 781 | 67 | 75 | 14.4 | 20 | 8323 | 49 |
| Yale University | Yes | 10705 | 2453 | 1317 | 95 | 99 | 5217 | 83 | 19840 | 6510 | 630 | 2115 | 96 | 96 | 5.8 | 49 | 40386 | 99 |
| York College of Pennsylvania | Yes | 2989 | 1855 | 691 | 28 | 63 | 2988 | 1726 | 4990 | 3560 | 500 | 1250 | 75 | 75 | 18.1 | 28 | 4509 | 99 |
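# First, look at some basic information about the data
# Dimensions
dim(college) # 777 x 19, i.e. 777 rows and 19 columns
rownames(x = college) # row index; the school names are actually in the first column
colnames(x = college) # column names; the first one turns out to be '...1'
college[,'...1'] # the school names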
- 777
- 19
- '1'
- '2'
- '3'
- '4'
- '5'
- '6'
- '7'
- '8'
- '9'
- '10'
- '11'
- '12'
- '13'
- '14'
- '15'
- '16'
- '17'
- '18'
- '19'
- '20'
- '21'
- '22'
- '23'
- '24'
- '25'
- '26'
- '27'
- '28'
- '29'
- '30'
- '31'
- '32'
- '33'
- '34'
- '35'
- '36'
- '37'
- '38'
- '39'
- '40'
- '41'
- '42'
- '43'
- '44'
- '45'
- '46'
- '47'
- '48'
- '49'
- '50'
- '51'
- '52'
- '53'
- '54'
- '55'
- '56'
- '57'
- '58'
- '59'
- '60'
- '61'
- '62'
- '63'
- '64'
- '65'
- '66'
- '67'
- '68'
- '69'
- '70'
- '71'
- '72'
- '73'
- '74'
- '75'
- '76'
- '77'
- '78'
- '79'
- '80'
- '81'
- '82'
- '83'
- '84'
- '85'
- '86'
- '87'
- '88'
- '89'
- '90'
- '91'
- '92'
- '93'
- '94'
- '95'
- '96'
- '97'
- '98'
- '99'
- '100'
- '101'
- '102'
- '103'
- '104'
- '105'
- '106'
- '107'
- '108'
- '109'
- '110'
- '111'
- '112'
- '113'
- '114'
- '115'
- '116'
- '117'
- '118'
- '119'
- '120'
- '121'
- '122'
- '123'
- '124'
- '125'
- '126'
- '127'
- '128'
- '129'
- '130'
- '131'
- '132'
- '133'
- '134'
- '135'
- '136'
- '137'
- '138'
- '139'
- '140'
- '141'
- '142'
- '143'
- '144'
- '145'
- '146'
- '147'
- '148'
- '149'
- '150'
- '151'
- '152'
- '153'
- '154'
- '155'
- '156'
- '157'
- '158'
- '159'
- '160'
- '161'
- '162'
- '163'
- '164'
- '165'
- '166'
- '167'
- '168'
- '169'
- '170'
- '171'
- '172'
- '173'
- '174'
- '175'
- '176'
- '177'
- '178'
- '179'
- '180'
- '181'
- '182'
- '183'
- '184'
- '185'
- '186'
- '187'
- '188'
- '189'
- '190'
- '191'
- '192'
- '193'
- '194'
- '195'
- '196'
- '197'
- '198'
- '199'
- '200'
- ⋯
- '578'
- '579'
- '580'
- '581'
- '582'
- '583'
- '584'
- '585'
- '586'
- '587'
- '588'
- '589'
- '590'
- '591'
- '592'
- '593'
- '594'
- '595'
- '596'
- '597'
- '598'
- '599'
- '600'
- '601'
- '602'
- '603'
- '604'
- '605'
- '606'
- '607'
- '608'
- '609'
- '610'
- '611'
- '612'
- '613'
- '614'
- '615'
- '616'
- '617'
- '618'
- '619'
- '620'
- '621'
- '622'
- '623'
- '624'
- '625'
- '626'
- '627'
- '628'
- '629'
- '630'
- '631'
- '632'
- '633'
- '634'
- '635'
- '636'
- '637'
- '638'
- '639'
- '640'
- '641'
- '642'
- '643'
- '644'
- '645'
- '646'
- '647'
- '648'
- '649'
- '650'
- '651'
- '652'
- '653'
- '654'
- '655'
- '656'
- '657'
- '658'
- '659'
- '660'
- '661'
- '662'
- '663'
- '664'
- '665'
- '666'
- '667'
- '668'
- '669'
- '670'
- '671'
- '672'
- '673'
- '674'
- '675'
- '676'
- '677'
- '678'
- '679'
- '680'
- '681'
- '682'
- '683'
- '684'
- '685'
- '686'
- '687'
- '688'
- '689'
- '690'
- '691'
- '692'
- '693'
- '694'
- '695'
- '696'
- '697'
- '698'
- '699'
- '700'
- '701'
- '702'
- '703'
- '704'
- '705'
- '706'
- '707'
- '708'
- '709'
- '710'
- '711'
- '712'
- '713'
- '714'
- '715'
- '716'
- '717'
- '718'
- '719'
- '720'
- '721'
- '722'
- '723'
- '724'
- '725'
- '726'
- '727'
- '728'
- '729'
- '730'
- '731'
- '732'
- '733'
- '734'
- '735'
- '736'
- '737'
- '738'
- '739'
- '740'
- '741'
- '742'
- '743'
- '744'
- '745'
- '746'
- '747'
- '748'
- '749'
- '750'
- '751'
- '752'
- '753'
- '754'
- '755'
- '756'
- '757'
- '758'
- '759'
- '760'
- '761'
- '762'
- '763'
- '764'
- '765'
- '766'
- '767'
- '768'
- '769'
- '770'
- '771'
- '772'
- '773'
- '774'
- '775'
- '776'
- '777'
- '...1'
- 'Private'
- 'Apps'
- 'Accept'
- 'Enroll'
- 'Top10perc'
- 'Top25perc'
- 'F.Undergrad'
- 'P.Undergrad'
- 'Outstate'
- 'Room.Board'
- 'Books'
- 'Personal'
- 'PhD'
- 'Terminal'
- 'S.F.Ratio'
- 'perc.alumni'
- 'Expend'
- 'Grad.Rate'
| ...1 |
|---|
| <chr> |
| Abilene Christian University |
| Adelphi University |
| Adrian College |
| Agnes Scott College |
| Alaska Pacific University |
| Albertson College |
| Albertus Magnus College |
| Albion College |
| Albright College |
| Alderson-Broaddus College |
| Alfred University |
| Allegheny College |
| Allentown Coll. of St. Francis de Sales |
| Alma College |
| Alverno College |
| American International College |
| Amherst College |
| Anderson University |
| Andrews University |
| Angelo State University |
| Antioch University |
| Appalachian State University |
| Aquinas College |
| Arizona State University Main campus |
| Arkansas College (Lyon College) |
| Arkansas Tech University |
| Assumption College |
| Auburn University-Main Campus |
| Augsburg College |
| Augustana College IL |
| ⋮ |
| Westfield State College |
| Westminster College MO |
| Westminster College |
| Westminster College of Salt Lake City |
| Westmont College |
| Wheaton College IL |
| Westminster College PA |
| Wheeling Jesuit College |
| Whitman College |
| Whittier College |
| Whitworth College |
| Widener University |
| Wilkes University |
| Willamette University |
| William Jewell College |
| William Woods University |
| Williams College |
| Wilson College |
| Wingate College |
| Winona State University |
| Winthrop University |
| Wisconsin Lutheran College |
| Wittenberg University |
| Wofford College |
| Worcester Polytechnic Institute |
| Worcester State College |
| Xavier University |
| Xavier University of Louisiana |
| Yale University |
| York College of Pennsylvania |
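# The issue above is that the school names should be the row names, since each row is one observation
college <- read.csv("/data1/project/College.csv")
# rownames(college) <- college[, "X"] # index by column name (read.csv names the unnamed first column "X")
rownames(college) <- college[,1] # index by column position; this is the approach we use
# head(college)
# view(college) # must be viewed in RStudio; a suitable extension can also be configured in VS Code
# For now we just use head() or rownames() to check the result
head(college)
# rownames(college)
college <- college[, -1] # drop the first column
head(college)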
| X | Private | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal | PhD | Terminal | S.F.Ratio | perc.alumni | Expend | Grad.Rate | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <chr> | <chr> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <dbl> | <int> | <int> | <int> | |
| Abilene Christian University | Abilene Christian University | Yes | 1660 | 1232 | 721 | 23 | 52 | 2885 | 537 | 7440 | 3300 | 450 | 2200 | 70 | 78 | 18.1 | 12 | 7041 | 60 |
| Adelphi University | Adelphi University | Yes | 2186 | 1924 | 512 | 16 | 29 | 2683 | 1227 | 12280 | 6450 | 750 | 1500 | 29 | 30 | 12.2 | 16 | 10527 | 56 |
| Adrian College | Adrian College | Yes | 1428 | 1097 | 336 | 22 | 50 | 1036 | 99 | 11250 | 3750 | 400 | 1165 | 53 | 66 | 12.9 | 30 | 8735 | 54 |
| Agnes Scott College | Agnes Scott College | Yes | 417 | 349 | 137 | 60 | 89 | 510 | 63 | 12960 | 5450 | 450 | 875 | 92 | 97 | 7.7 | 37 | 19016 | 59 |
| Alaska Pacific University | Alaska Pacific University | Yes | 193 | 146 | 55 | 16 | 44 | 249 | 869 | 7560 | 4120 | 800 | 1500 | 76 | 72 | 11.9 | 2 | 10922 | 15 |
| Albertson College | Albertson College | Yes | 587 | 479 | 158 | 38 | 62 | 678 | 41 | 13500 | 3335 | 500 | 675 | 67 | 73 | 9.4 | 11 | 9727 | 55 |
| Private | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal | PhD | Terminal | S.F.Ratio | perc.alumni | Expend | Grad.Rate | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <chr> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <dbl> | <int> | <int> | <int> | |
| Abilene Christian University | Yes | 1660 | 1232 | 721 | 23 | 52 | 2885 | 537 | 7440 | 3300 | 450 | 2200 | 70 | 78 | 18.1 | 12 | 7041 | 60 |
| Adelphi University | Yes | 2186 | 1924 | 512 | 16 | 29 | 2683 | 1227 | 12280 | 6450 | 750 | 1500 | 29 | 30 | 12.2 | 16 | 10527 | 56 |
| Adrian College | Yes | 1428 | 1097 | 336 | 22 | 50 | 1036 | 99 | 11250 | 3750 | 400 | 1165 | 53 | 66 | 12.9 | 30 | 8735 | 54 |
| Agnes Scott College | Yes | 417 | 349 | 137 | 60 | 89 | 510 | 63 | 12960 | 5450 | 450 | 875 | 92 | 97 | 7.7 | 37 | 19016 | 59 |
| Alaska Pacific University | Yes | 193 | 146 | 55 | 16 | 44 | 249 | 869 | 7560 | 4120 | 800 | 1500 | 76 | 72 | 11.9 | 2 | 10922 | 15 |
| Albertson College | Yes | 587 | 479 | 158 | 38 | 62 | 678 | 41 | 13500 | 3335 | 500 | 675 | 67 | 73 | 9.4 | 11 | 9727 | 55 |
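As an alternative to assigning the row names and then dropping the first column, `read.csv()` can do both in one step through its `row.names` argument. The sketch below uses a tiny temporary CSV as a stand-in for College.csv (the two rows are copied from the output above); the object names `tmp` and `toy` are illustrative only.

```r
# Sketch: let read.csv() use the first column as row names directly.
# A small temporary file stands in for College.csv.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  '"","Private","Apps","Accept"',
  '"Abilene Christian University","Yes",1660,1232',
  '"Adelphi University","Yes",2186,1924'
), tmp)

# row.names = 1 moves column 1 into the row names and removes it from the data;
# stringsAsFactors = TRUE reads Private as a factor instead of a character column
toy <- read.csv(tmp, row.names = 1, stringsAsFactors = TRUE)
rownames(toy)
str(toy)
unlink(tmp)
```

Reading `Private` as a factor also means the later plots and summaries can treat it as a grouping variable without further conversion.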
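# The approach above works for a data.frame, but since we are using the tidyverse, tibble has a dedicated function for turning a column into row names
college <- read_csv("/data1/project/College.csv",col_names = TRUE)
college <- college %>% column_to_rownames(var = "...1")
# With this approach the first column does not have to be removed by hand
head(college)
# rownames(college)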
New names: • `` -> `...1` Rows: 777 Columns: 19 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," chr (2): ...1, Private dbl (17): Apps, Accept, Enroll, Top10perc, Top25perc, F.Undergrad, P.Undergr... ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
| Private | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal | PhD | Terminal | S.F.Ratio | perc.alumni | Expend | Grad.Rate | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | |
| Abilene Christian University | Yes | 1660 | 1232 | 721 | 23 | 52 | 2885 | 537 | 7440 | 3300 | 450 | 2200 | 70 | 78 | 18.1 | 12 | 7041 | 60 |
| Adelphi University | Yes | 2186 | 1924 | 512 | 16 | 29 | 2683 | 1227 | 12280 | 6450 | 750 | 1500 | 29 | 30 | 12.2 | 16 | 10527 | 56 |
| Adrian College | Yes | 1428 | 1097 | 336 | 22 | 50 | 1036 | 99 | 11250 | 3750 | 400 | 1165 | 53 | 66 | 12.9 | 30 | 8735 | 54 |
| Agnes Scott College | Yes | 417 | 349 | 137 | 60 | 89 | 510 | 63 | 12960 | 5450 | 450 | 875 | 92 | 97 | 7.7 | 37 | 19016 | 59 |
| Alaska Pacific University | Yes | 193 | 146 | 55 | 16 | 44 | 249 | 869 | 7560 | 4120 | 800 | 1500 | 76 | 72 | 11.9 | 2 | 10922 | 15 |
| Albertson College | Yes | 587 | 479 | 158 | 38 | 62 | 678 | 41 | 13500 | 3335 | 500 | 675 | 67 | 73 | 9.4 | 11 | 9727 | 55 |
# Next, some basic statistics
summary(college) # gives summary statistics for each variable (column)
Private Apps Accept Enroll
Length:777 Min. : 81 Min. : 72 Min. : 35
Class :character 1st Qu.: 776 1st Qu.: 604 1st Qu.: 242
Mode :character Median : 1558 Median : 1110 Median : 434
Mean : 3002 Mean : 2019 Mean : 780
3rd Qu.: 3624 3rd Qu.: 2424 3rd Qu.: 902
Max. :48094 Max. :26330 Max. :6392
Top10perc Top25perc F.Undergrad P.Undergrad
Min. : 1.00 Min. : 9.0 Min. : 139 Min. : 1.0
1st Qu.:15.00 1st Qu.: 41.0 1st Qu.: 992 1st Qu.: 95.0
Median :23.00 Median : 54.0 Median : 1707 Median : 353.0
Mean :27.56 Mean : 55.8 Mean : 3700 Mean : 855.3
3rd Qu.:35.00 3rd Qu.: 69.0 3rd Qu.: 4005 3rd Qu.: 967.0
Max. :96.00 Max. :100.0 Max. :31643 Max. :21836.0
Outstate Room.Board Books Personal
Min. : 2340 Min. :1780 Min. : 96.0 Min. : 250
1st Qu.: 7320 1st Qu.:3597 1st Qu.: 470.0 1st Qu.: 850
Median : 9990 Median :4200 Median : 500.0 Median :1200
Mean :10441 Mean :4358 Mean : 549.4 Mean :1341
3rd Qu.:12925 3rd Qu.:5050 3rd Qu.: 600.0 3rd Qu.:1700
Max. :21700 Max. :8124 Max. :2340.0 Max. :6800
PhD Terminal S.F.Ratio perc.alumni
Min. : 8.00 Min. : 24.0 Min. : 2.50 Min. : 0.00
1st Qu.: 62.00 1st Qu.: 71.0 1st Qu.:11.50 1st Qu.:13.00
Median : 75.00 Median : 82.0 Median :13.60 Median :21.00
Mean : 72.66 Mean : 79.7 Mean :14.09 Mean :22.74
3rd Qu.: 85.00 3rd Qu.: 92.0 3rd Qu.:16.50 3rd Qu.:31.00
Max. :103.00 Max. :100.0 Max. :39.80 Max. :64.00
Expend Grad.Rate
Min. : 3186 Min. : 10.00
1st Qu.: 6751 1st Qu.: 53.00
Median : 8377 Median : 65.00
Mean : 9660 Mean : 65.46
3rd Qu.:10830 3rd Qu.: 78.00
Max. :56233 Max. :118.00
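Two things stand out in the summary above: `Private` is reported only as a character column, and the maxima of `PhD` (103) and `Grad.Rate` (118) exceed 100 although both are percentages. The sketch below shows one way to follow up on both, using a made-up toy data frame in place of `college`; the school names and values are invented for illustration.

```r
# Toy stand-in for the college data frame (values are made up)
toy <- data.frame(
  Private   = c("Yes", "Yes", "No", "Yes"),
  PhD       = c(70, 103, 60, 92),
  Grad.Rate = c(60, 56, 118, 59),
  row.names = c("School A", "School B", "School C", "School D")
)

# For a character column summary() reports only length, class and mode;
# for a factor it reports the number of rows in each level
toy$Private <- as.factor(toy$Private)
summary(toy$Private)

# Percentages should not exceed 100; list the rows that do
suspect <- toy[toy$PhD > 100 | toy$Grad.Rate > 100, c("PhD", "Grad.Rate")]
suspect
```

On the real data the same filter would show which schools carry the out-of-range values, so they can be checked before any modelling.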
college <- read.csv("/data1/project/College.csv")
rownames(college) <- college[,1] # index by column position; this is the approach we use
college <- college[, -1] # drop the first column
college
| Private | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal | PhD | Terminal | S.F.Ratio | perc.alumni | Expend | Grad.Rate | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <chr> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <dbl> | <int> | <int> | <int> | |
| Abilene Christian University | Yes | 1660 | 1232 | 721 | 23 | 52 | 2885 | 537 | 7440 | 3300 | 450 | 2200 | 70 | 78 | 18.1 | 12 | 7041 | 60 |
| Adelphi University | Yes | 2186 | 1924 | 512 | 16 | 29 | 2683 | 1227 | 12280 | 6450 | 750 | 1500 | 29 | 30 | 12.2 | 16 | 10527 | 56 |
| Adrian College | Yes | 1428 | 1097 | 336 | 22 | 50 | 1036 | 99 | 11250 | 3750 | 400 | 1165 | 53 | 66 | 12.9 | 30 | 8735 | 54 |
| Agnes Scott College | Yes | 417 | 349 | 137 | 60 | 89 | 510 | 63 | 12960 | 5450 | 450 | 875 | 92 | 97 | 7.7 | 37 | 19016 | 59 |
| Alaska Pacific University | Yes | 193 | 146 | 55 | 16 | 44 | 249 | 869 | 7560 | 4120 | 800 | 1500 | 76 | 72 | 11.9 | 2 | 10922 | 15 |
| Albertson College | Yes | 587 | 479 | 158 | 38 | 62 | 678 | 41 | 13500 | 3335 | 500 | 675 | 67 | 73 | 9.4 | 11 | 9727 | 55 |
| Albertus Magnus College | Yes | 353 | 340 | 103 | 17 | 45 | 416 | 230 | 13290 | 5720 | 500 | 1500 | 90 | 93 | 11.5 | 26 | 8861 | 63 |
| Albion College | Yes | 1899 | 1720 | 489 | 37 | 68 | 1594 | 32 | 13868 | 4826 | 450 | 850 | 89 | 100 | 13.7 | 37 | 11487 | 73 |
| Albright College | Yes | 1038 | 839 | 227 | 30 | 63 | 973 | 306 | 15595 | 4400 | 300 | 500 | 79 | 84 | 11.3 | 23 | 11644 | 80 |
| Alderson-Broaddus College | Yes | 582 | 498 | 172 | 21 | 44 | 799 | 78 | 10468 | 3380 | 660 | 1800 | 40 | 41 | 11.5 | 15 | 8991 | 52 |
| Alfred University | Yes | 1732 | 1425 | 472 | 37 | 75 | 1830 | 110 | 16548 | 5406 | 500 | 600 | 82 | 88 | 11.3 | 31 | 10932 | 73 |
| Allegheny College | Yes | 2652 | 1900 | 484 | 44 | 77 | 1707 | 44 | 17080 | 4440 | 400 | 600 | 73 | 91 | 9.9 | 41 | 11711 | 76 |
| Allentown Coll. of St. Francis de Sales | Yes | 1179 | 780 | 290 | 38 | 64 | 1130 | 638 | 9690 | 4785 | 600 | 1000 | 60 | 84 | 13.3 | 21 | 7940 | 74 |
| Alma College | Yes | 1267 | 1080 | 385 | 44 | 73 | 1306 | 28 | 12572 | 4552 | 400 | 400 | 79 | 87 | 15.3 | 32 | 9305 | 68 |
| Alverno College | Yes | 494 | 313 | 157 | 23 | 46 | 1317 | 1235 | 8352 | 3640 | 650 | 2449 | 36 | 69 | 11.1 | 26 | 8127 | 55 |
| American International College | Yes | 1420 | 1093 | 220 | 9 | 22 | 1018 | 287 | 8700 | 4780 | 450 | 1400 | 78 | 84 | 14.7 | 19 | 7355 | 69 |
| Amherst College | Yes | 4302 | 992 | 418 | 83 | 96 | 1593 | 5 | 19760 | 5300 | 660 | 1598 | 93 | 98 | 8.4 | 63 | 21424 | 100 |
| Anderson University | Yes | 1216 | 908 | 423 | 19 | 40 | 1819 | 281 | 10100 | 3520 | 550 | 1100 | 48 | 61 | 12.1 | 14 | 7994 | 59 |
| Andrews University | Yes | 1130 | 704 | 322 | 14 | 23 | 1586 | 326 | 9996 | 3090 | 900 | 1320 | 62 | 66 | 11.5 | 18 | 10908 | 46 |
| Angelo State University | No | 3540 | 2001 | 1016 | 24 | 54 | 4190 | 1512 | 5130 | 3592 | 500 | 2000 | 60 | 62 | 23.1 | 5 | 4010 | 34 |
| Antioch University | Yes | 713 | 661 | 252 | 25 | 44 | 712 | 23 | 15476 | 3336 | 400 | 1100 | 69 | 82 | 11.3 | 35 | 42926 | 48 |
| Appalachian State University | No | 7313 | 4664 | 1910 | 20 | 63 | 9940 | 1035 | 6806 | 2540 | 96 | 2000 | 83 | 96 | 18.3 | 14 | 5854 | 70 |
| Aquinas College | Yes | 619 | 516 | 219 | 20 | 51 | 1251 | 767 | 11208 | 4124 | 350 | 1615 | 55 | 65 | 12.7 | 25 | 6584 | 65 |
| Arizona State University Main campus | No | 12809 | 10308 | 3761 | 24 | 49 | 22593 | 7585 | 7434 | 4850 | 700 | 2100 | 88 | 93 | 18.9 | 5 | 4602 | 48 |
| Arkansas College (Lyon College) | Yes | 708 | 334 | 166 | 46 | 74 | 530 | 182 | 8644 | 3922 | 500 | 800 | 79 | 88 | 12.6 | 24 | 14579 | 54 |
| Arkansas Tech University | No | 1734 | 1729 | 951 | 12 | 52 | 3602 | 939 | 3460 | 2650 | 450 | 1000 | 57 | 60 | 19.6 | 5 | 4739 | 48 |
| Assumption College | Yes | 2135 | 1700 | 491 | 23 | 59 | 1708 | 689 | 12000 | 5920 | 500 | 500 | 93 | 93 | 13.8 | 30 | 7100 | 88 |
| Auburn University-Main Campus | No | 7548 | 6791 | 3070 | 25 | 57 | 16262 | 1716 | 6300 | 3933 | 600 | 1908 | 85 | 91 | 16.7 | 18 | 6642 | 69 |
| Augsburg College | Yes | 662 | 513 | 257 | 12 | 30 | 2074 | 726 | 11902 | 4372 | 540 | 950 | 65 | 65 | 12.8 | 31 | 7836 | 58 |
| Augustana College IL | Yes | 1879 | 1658 | 497 | 36 | 69 | 1950 | 38 | 13353 | 4173 | 540 | 821 | 78 | 83 | 12.7 | 40 | 9220 | 71 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| Westfield State College | No | 3100 | 2150 | 825 | 3 | 20 | 3234 | 941 | 5542 | 3788 | 500 | 1300 | 75 | 79 | 15.7 | 20 | 4222 | 65 |
| Westminster College MO | Yes | 662 | 553 | 184 | 20 | 43 | 665 | 37 | 10720 | 4050 | 600 | 1650 | 66 | 70 | 12.5 | 20 | 7925 | 62 |
| Westminster College | Yes | 996 | 866 | 377 | 29 | 58 | 1411 | 72 | 12065 | 3615 | 430 | 685 | 62 | 78 | 12.5 | 41 | 8596 | 80 |
| Westminster College of Salt Lake City | Yes | 917 | 720 | 213 | 21 | 60 | 979 | 743 | 8820 | 4050 | 600 | 2025 | 68 | 83 | 10.5 | 34 | 7170 | 50 |
| Westmont College | No | 950 | 713 | 351 | 42 | 72 | 1276 | 9 | 14320 | 5304 | 490 | 1410 | 77 | 77 | 14.9 | 17 | 8837 | 87 |
| Wheaton College IL | Yes | 1432 | 920 | 548 | 56 | 84 | 2200 | 56 | 11480 | 4200 | 530 | 1400 | 81 | 83 | 12.7 | 40 | 11916 | 85 |
| Westminster College PA | Yes | 1738 | 1373 | 417 | 21 | 55 | 1335 | 30 | 18460 | 5970 | 700 | 850 | 92 | 96 | 13.2 | 41 | 22704 | 71 |
| Wheeling Jesuit College | Yes | 903 | 755 | 213 | 15 | 49 | 971 | 305 | 10500 | 4545 | 600 | 600 | 66 | 71 | 14.1 | 27 | 7494 | 72 |
| Whitman College | Yes | 1861 | 998 | 359 | 45 | 77 | 1220 | 46 | 16670 | 4900 | 750 | 800 | 80 | 83 | 10.5 | 51 | 13198 | 72 |
| Whittier College | Yes | 1681 | 1069 | 344 | 35 | 63 | 1235 | 30 | 16249 | 5699 | 500 | 1998 | 84 | 92 | 13.6 | 29 | 11778 | 52 |
| Whitworth College | Yes | 1121 | 926 | 372 | 43 | 70 | 1270 | 160 | 12660 | 4500 | 678 | 2424 | 80 | 80 | 16.9 | 20 | 8328 | 80 |
| Widener University | Yes | 2139 | 1492 | 502 | 24 | 64 | 2186 | 2171 | 12350 | 5370 | 500 | 1350 | 88 | 86 | 12.6 | 19 | 9603 | 63 |
| Wilkes University | Yes | 1631 | 1431 | 434 | 15 | 36 | 1803 | 603 | 11150 | 5130 | 550 | 1260 | 78 | 92 | 13.3 | 24 | 8543 | 67 |
| Willamette University | Yes | 1658 | 1327 | 395 | 49 | 80 | 1595 | 159 | 14800 | 4620 | 400 | 790 | 91 | 94 | 13.3 | 37 | 10779 | 68 |
| William Jewell College | Yes | 663 | 547 | 315 | 32 | 67 | 1279 | 75 | 10060 | 2970 | 500 | 2600 | 74 | 80 | 11.2 | 19 | 7885 | 59 |
| William Woods University | Yes | 469 | 435 | 227 | 17 | 39 | 851 | 120 | 10535 | 4365 | 550 | 3700 | 39 | 66 | 12.9 | 16 | 7438 | 52 |
| Williams College | Yes | 4186 | 1245 | 526 | 81 | 96 | 1988 | 29 | 19629 | 5790 | 500 | 1200 | 94 | 99 | 9.0 | 64 | 22014 | 99 |
| Wilson College | Yes | 167 | 130 | 46 | 16 | 50 | 199 | 676 | 11428 | 5084 | 450 | 475 | 67 | 76 | 8.3 | 43 | 10291 | 67 |
| Wingate College | Yes | 1239 | 1017 | 383 | 10 | 34 | 1207 | 157 | 7820 | 3400 | 550 | 1550 | 69 | 81 | 13.9 | 8 | 7264 | 91 |
| Winona State University | No | 3325 | 2047 | 1301 | 20 | 45 | 5800 | 872 | 4200 | 2700 | 300 | 1200 | 53 | 60 | 20.2 | 18 | 5318 | 58 |
| Winthrop University | No | 2320 | 1805 | 769 | 24 | 61 | 3395 | 670 | 6400 | 3392 | 580 | 2150 | 71 | 80 | 12.8 | 26 | 6729 | 59 |
| Wisconsin Lutheran College | Yes | 152 | 128 | 75 | 17 | 41 | 282 | 22 | 9100 | 3700 | 500 | 1400 | 48 | 48 | 8.5 | 26 | 8960 | 50 |
| Wittenberg University | Yes | 1979 | 1739 | 575 | 42 | 68 | 1980 | 144 | 15948 | 4404 | 400 | 800 | 82 | 95 | 12.8 | 29 | 10414 | 78 |
| Wofford College | Yes | 1501 | 935 | 273 | 51 | 83 | 1059 | 34 | 12680 | 4150 | 605 | 1440 | 91 | 92 | 15.3 | 42 | 7875 | 75 |
| Worcester Polytechnic Institute | Yes | 2768 | 2314 | 682 | 49 | 86 | 2802 | 86 | 15884 | 5370 | 530 | 730 | 92 | 94 | 15.2 | 34 | 10774 | 82 |
| Worcester State College | No | 2197 | 1515 | 543 | 4 | 26 | 3089 | 2029 | 6797 | 3900 | 500 | 1200 | 60 | 60 | 21.0 | 14 | 4469 | 40 |
| Xavier University | Yes | 1959 | 1805 | 695 | 24 | 47 | 2849 | 1107 | 11520 | 4960 | 600 | 1250 | 73 | 75 | 13.3 | 31 | 9189 | 83 |
| Xavier University of Louisiana | Yes | 2097 | 1915 | 695 | 34 | 61 | 2793 | 166 | 6900 | 4200 | 617 | 781 | 67 | 75 | 14.4 | 20 | 8323 | 49 |
| Yale University | Yes | 10705 | 2453 | 1317 | 95 | 99 | 5217 | 83 | 19840 | 6510 | 630 | 2115 | 96 | 96 | 5.8 | 49 | 40386 | 99 |
| York College of Pennsylvania | Yes | 2989 | 1855 | 691 | 28 | 63 | 2988 | 1726 | 4990 | 3560 | 500 | 1250 | 75 | 75 | 18.1 | 28 | 4509 | 99 |
# c
# college[,1:10]
college <- read.csv("/data1/project/College.csv")
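rownames(college) <- college[,1] # index by column position; this is the approach we use
college <- college[, -1] # drop the first column
# Private is a character column, so we drop it before drawing the scatterplot matrix
# college[,-1][,1:10] # this drops the Private column
pairs(college[,-1][,1:10]) # pairwise scatter plot, i.e. a scatterplot matrix of every pair of variables
# The plot cannot be viewed in VS Code; a screenshot of it is shown in the md
# A simple plot: draw boxplots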
boxplot(Outstate~Private,data=college)
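# It later turned out that the problem was not Jupyter's rendering but the theme: with a dark theme, the plots drawn before and after this point get a black background, so the theme needs to be set to light
Opened in the web version of Jupyter, though, the plots should look the same and be visible.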
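# library(tidyverse)
college %>% ggplot(mapping = aes(y=Outstate, x=Private)) + geom_boxplot()
# ggplot figures keep a white background unless the ggplot theme is changed, so we can use ggplot directly
Elite <- rep("No", nrow(college)) # initialise to "No", repeated once per row of college; the vector is character, like "No"
# class(rep("No", nrow(college)) ) # "character"
Elite[college$Top10perc > 50] <- "Yes" # relies on the one-to-one match between vector positions and data frame rows: the logical index picks the matching rows
Elite <- as.factor(Elite) # convert to a factor
college <- data.frame(college, Elite) # append Elite as a new column of the data frame
head(college)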
| | Private | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal | PhD | Terminal | S.F.Ratio | perc.alumni | Expend | Grad.Rate | Elite |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | <chr> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <dbl> | <int> | <int> | <int> | <fct> |
| Abilene Christian University | Yes | 1660 | 1232 | 721 | 23 | 52 | 2885 | 537 | 7440 | 3300 | 450 | 2200 | 70 | 78 | 18.1 | 12 | 7041 | 60 | No |
| Adelphi University | Yes | 2186 | 1924 | 512 | 16 | 29 | 2683 | 1227 | 12280 | 6450 | 750 | 1500 | 29 | 30 | 12.2 | 16 | 10527 | 56 | No |
| Adrian College | Yes | 1428 | 1097 | 336 | 22 | 50 | 1036 | 99 | 11250 | 3750 | 400 | 1165 | 53 | 66 | 12.9 | 30 | 8735 | 54 | No |
| Agnes Scott College | Yes | 417 | 349 | 137 | 60 | 89 | 510 | 63 | 12960 | 5450 | 450 | 875 | 92 | 97 | 7.7 | 37 | 19016 | 59 | Yes |
| Alaska Pacific University | Yes | 193 | 146 | 55 | 16 | 44 | 249 | 869 | 7560 | 4120 | 800 | 1500 | 76 | 72 | 11.9 | 2 | 10922 | 15 | No |
| Albertson College | Yes | 587 | 479 | 158 | 38 | 62 | 678 | 41 | 13500 | 3335 | 500 | 675 | 67 | 73 | 9.4 | 11 | 9727 | 55 | No |
# The same step can of course be done with tidyverse, here adding a new column called elite
college %>% mutate(elite = ifelse(Top10perc > 50, "Yes", "No")) %>% head()
| | Private | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal | PhD | Terminal | S.F.Ratio | perc.alumni | Expend | Grad.Rate | Elite | elite |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | <chr> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <dbl> | <int> | <int> | <int> | <fct> | <chr> |
| Abilene Christian University | Yes | 1660 | 1232 | 721 | 23 | 52 | 2885 | 537 | 7440 | 3300 | 450 | 2200 | 70 | 78 | 18.1 | 12 | 7041 | 60 | No | No |
| Adelphi University | Yes | 2186 | 1924 | 512 | 16 | 29 | 2683 | 1227 | 12280 | 6450 | 750 | 1500 | 29 | 30 | 12.2 | 16 | 10527 | 56 | No | No |
| Adrian College | Yes | 1428 | 1097 | 336 | 22 | 50 | 1036 | 99 | 11250 | 3750 | 400 | 1165 | 53 | 66 | 12.9 | 30 | 8735 | 54 | No | No |
| Agnes Scott College | Yes | 417 | 349 | 137 | 60 | 89 | 510 | 63 | 12960 | 5450 | 450 | 875 | 92 | 97 | 7.7 | 37 | 19016 | 59 | Yes | Yes |
| Alaska Pacific University | Yes | 193 | 146 | 55 | 16 | 44 | 249 | 869 | 7560 | 4120 | 800 | 1500 | 76 | 72 | 11.9 | 2 | 10922 | 15 | No | No |
| Albertson College | Yes | 587 | 479 | 158 | 38 | 62 | 678 | 41 | 13500 | 3335 | 500 | 675 | 67 | 73 | 9.4 | 11 | 9727 | 55 | No | No |
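As a compact variant of the two approaches above, the character-then-factor steps can be folded into a single `factor(ifelse(...))` call. The sketch below is only an illustration: `toy` is a hypothetical three-row stand-in for `college`, with the `Top10perc` values taken from the `head(college)` output above.

```r
# Toy stand-in for college: three rows, Top10perc values copied from head(college)
toy <- data.frame(
  Top10perc = c(23, 16, 60),
  row.names = c("Abilene Christian University", "Adelphi University", "Agnes Scott College")
)

# Build the factor in one step; fixing the levels keeps "No" first even if a level is absent
toy$Elite <- factor(ifelse(toy$Top10perc > 50, "Yes", "No"), levels = c("No", "Yes"))

# table() gives the same per-level counts that summary() reports for a factor
elite_counts <- table(toy$Elite)
print(elite_counts)
```

Setting `levels` explicitly is a design choice: it keeps the level order stable across subsets, which matters for boxplot ordering.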
summary(college$Elite)
boxplot(Outstate~Elite,college) # ggplot is still the recommended way to draw this
college %>% ggplot(mapping = aes(y=Outstate, x=Elite)) + geom_boxplot()
| No | Yes |
|---|---|
| 699 | 78 |
# Next, the histograms
par(mfrow=c(2,2))
hist(college$Apps)
hist(college$Accept)
hist(college$Enroll)
hist(college$PhD)
# ggplot is recommended
college %>% ggplot(mapping = aes(x=Apps)) + geom_histogram()
college %>% ggplot(mapping = aes(x=Accept)) + geom_histogram()
college %>% ggplot(mapping = aes(x=Enroll)) + geom_histogram()
college %>% ggplot(mapping = aes(x=PhD)) + geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
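The histograms cover the number of applications (Apps), acceptances (Accept) and enrollments (Enroll), plus the percentage of faculty holding a PhD.
The first three shrink in turn (applications > acceptances > enrollments), which suggests that getting into college is competitive; the PhD histogram gives a sense of the faculty's education level.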
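The `stat_bin()` message above asks for an explicit `binwidth`. One common way to choose it is the Freedman-Diaconis rule; the sketch below applies it to a made-up right-skewed sample (`x` is simulated, not the College data), so the numbers themselves carry no meaning.

```r
# Toy right-skewed sample standing in for a column such as Apps (values are simulated)
set.seed(1)
x <- rexp(200, rate = 1 / 3000)

# Freedman-Diaconis rule: binwidth = 2 * IQR / n^(1/3)
fd_binwidth <- 2 * IQR(x) / length(x)^(1 / 3)

# Base R can apply the same rule when choosing breaks
h <- hist(x, breaks = "FD", plot = FALSE)
n_bins <- length(h$counts)  # number of bins actually used

# With ggplot2 loaded, passing the width explicitly silences the stat_bin() message:
# ggplot(data.frame(x), aes(x = x)) + geom_histogram(binwidth = fd_binwidth)
```

Note that `hist()` rounds the breaks to "pretty" values, so its bin width may differ slightly from `fd_binwidth`.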
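# Free exploration: essentially group some variables with tidyverse, look at summary statistics of the others, then run a test or two
# For example, public vs private schools: how do the faculties compare, measured by the percentage of faculty with a PhD?
college %>% group_by(Private) %>% summarise(mean(PhD)) # private schools have a lower mean PhD percentage, which is a little surprising
college %>% group_by(Private) %>% summarise(mean(PhD)) %>% ggplot(mapping=aes(x=Private,y=`mean(PhD)`)) + geom_bar(stat='identity') # bar chart of the two group means
# Or check the difference with a t-test
t.test(college$PhD~college$Private)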
| Private | mean(PhD) |
|---|---|
| <chr> | <dbl> |
| No | 76.83491 |
| Yes | 71.09381 |
Welch Two Sample t-test
data: college$PhD by college$Private
t = 5.1381, df = 531.86, p-value = 3.904e-07
alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
95 percent confidence interval:
3.546110 7.936091
sample estimates:
mean in group No mean in group Yes
76.83491 71.09381
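The printed t-test result can also be pulled apart programmatically, which is handy when only the difference in means, its confidence interval or the p-value is needed. A minimal sketch on hypothetical data (`toy` holds made-up PhD percentages, not values from College):

```r
# Hypothetical PhD percentages for six public ("No") and six private ("Yes") schools
toy <- data.frame(
  Private = rep(c("No", "Yes"), each = 6),
  PhD     = c(78, 80, 75, 77, 79, 76, 70, 72, 69, 71, 73, 68)
)

# The data= argument avoids the college$... prefixes; Welch's test is the default
tt <- t.test(PhD ~ Private, data = toy)

diff_means <- unname(tt$estimate[1] - tt$estimate[2])  # mean(No) - mean(Yes)
ci <- tt$conf.int                                      # 95% CI for that difference
p  <- tt$p.value
```

The groups are ordered alphabetically, so the difference is "No" minus "Yes", matching the sign of the confidence interval in the output above.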
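# Next, compare elite and non-elite schools on PhD percentage and on the ratio of accepted applicants to enrolled students
college %>% group_by(Elite) %>% summarise(mean(PhD)) # elite schools have a higher mean PhD percentage, as expected
college %>% group_by(Elite) %>% summarise(mean(PhD)) %>% ggplot(mapping=aes(x=Elite,y=`mean(PhD)`)) + geom_bar(stat='identity')
college %>% group_by(Elite) %>% summarise(mean(Accept/Enroll)) # note: Accept/Enroll is accepted applicants per enrolled student, not the acceptance rate (that would be Accept/Apps); it is slightly higher for elite schools, i.e. a smaller share of their admitted students actually enrol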
| Elite | mean(PhD) |
|---|---|
| <fct> | <dbl> |
| No | 70.80114 |
| Yes | 89.32051 |
| Elite | mean(Accept/Enroll) |
|---|---|
| <fct> | <dbl> |
| No | 2.660999 |
| Yes | 2.873612 |
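The ratio computed above is `Accept/Enroll`. The acceptance rate in the usual sense (share of applicants who are accepted) would be `Accept/Apps`, and the share of accepted applicants who enrol would be `Enroll/Accept`. A small sketch, using only the six rows shown in `head(college)` above (so the group means are illustrative, not results for the full data set):

```r
# Apps, Accept and Enroll for the first six schools, copied from head(college)
toy <- data.frame(
  Elite  = c("No", "No", "No", "Yes", "No", "No"),
  Apps   = c(1660, 2186, 1428, 417, 193, 587),
  Accept = c(1232, 1924, 1097, 349, 146, 479),
  Enroll = c(721, 512, 336, 137, 55, 158)
)

toy$accept_rate <- toy$Accept / toy$Apps    # share of applicants who were accepted
toy$yield       <- toy$Enroll / toy$Accept  # share of accepted applicants who enrolled

# Group means: a base-R equivalent of group_by(Elite) %>% summarise(...)
rates <- aggregate(cbind(accept_rate, yield) ~ Elite, data = toy, FUN = mean)
print(rates)
```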
10,
# a
# install.packages("ISLR2")
library(ISLR2)
head(Boston)
# ?Boston
# A data.frame: 506 × 13
# dim() reports the same dimensions
dim(Boston)
| | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | lstat | medv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | <dbl> | <dbl> | <dbl> | <int> | <dbl> | <dbl> | <dbl> | <dbl> | <int> | <dbl> | <dbl> | <dbl> | <dbl> |
| 1 | 0.00632 | 18 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296 | 15.3 | 4.98 | 24.0 |
| 2 | 0.02731 | 0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 9.14 | 21.6 |
| 3 | 0.02729 | 0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 4.03 | 34.7 |
| 4 | 0.03237 | 0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 2.94 | 33.4 |
| 5 | 0.06905 | 0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 5.33 | 36.2 |
| 6 | 0.02985 | 0 | 2.18 | 0 | 0.458 | 6.430 | 58.7 | 6.0622 | 3 | 222 | 18.7 | 5.21 | 28.7 |
- 506
- 13
# b
# Pairwise scatterplots again, with pairs
glimpse(Boston) # all columns are numeric (dbl or int), so pairs() works directly
pairs(Boston)
Rows: 506
Columns: 13
$ crim    <dbl> 0.00632, 0.02731, 0.02729, 0.03237, 0.06905, 0.02985, 0.08829,…
$ zn      <dbl> 18.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 1…
$ indus   <dbl> 2.31, 7.07, 7.07, 2.18, 2.18, 2.18, 7.87, 7.87, 7.87, 7.87, 7.…
$ chas    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ nox     <dbl> 0.538, 0.469, 0.469, 0.458, 0.458, 0.458, 0.524, 0.524, 0.524,…
$ rm      <dbl> 6.575, 6.421, 7.185, 6.998, 7.147, 6.430, 6.012, 6.172, 5.631,…
$ age     <dbl> 65.2, 78.9, 61.1, 45.8, 54.2, 58.7, 66.6, 96.1, 100.0, 85.9, 9…
$ dis     <dbl> 4.0900, 4.9671, 4.9671, 6.0622, 6.0622, 6.0622, 5.5605, 5.9505…
$ rad     <int> 1, 2, 2, 3, 3, 3, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,…
$ tax     <dbl> 296, 242, 242, 222, 222, 222, 311, 311, 311, 311, 311, 311, 31…
$ ptratio <dbl> 15.3, 17.8, 17.8, 18.7, 18.7, 18.7, 15.2, 15.2, 15.2, 15.2, 15…
$ lstat   <dbl> 4.98, 9.14, 4.03, 2.94, 5.33, 5.21, 12.43, 19.15, 29.93, 17.10…
$ medv    <dbl> 24.0, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15…
# c
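# The full pairs plot above is too cluttered to read; it is better to pick a few variables and plot them with ggplot
# For example, crim against dis, rad and ptratio
Boston %>% ggplot(mapping=aes(x = dis,y = crim))+geom_point() # dis vs crim: the farther a suburb is from the employment centres (i.e. the less developed areas), the lower its crime rate
Boston %>% ggplot(mapping=aes(x = rad,y = crim))+geom_point() # rad vs crim: no clear trend overall, but the suburbs with the best highway access tend to have much higher crime rates
Boston %>% ggplot(mapping=aes(x = ptratio,y = crim))+geom_point() # suburbs with a high pupil-teacher ratio, i.e. where teaching resources are unevenly spread, also show higher crime rates
# Many other variables could be compared with crim in the same way
# ?Boston # ‘crim’ per capita crime rate by town.
# "particularly high crime rates": how should this be measured? One option is to treat a crim value above 1 as high; either way, we should look at the distribution of crim
head(Boston) # 506 suburbs of Boston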
| | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | lstat | medv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | <dbl> | <dbl> | <dbl> | <int> | <dbl> | <dbl> | <dbl> | <dbl> | <int> | <dbl> | <dbl> | <dbl> | <dbl> |
| 1 | 0.00632 | 18 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296 | 15.3 | 4.98 | 24.0 |
| 2 | 0.02731 | 0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 9.14 | 21.6 |
| 3 | 0.02729 | 0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 4.03 | 34.7 |
| 4 | 0.03237 | 0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 2.94 | 33.4 |
| 5 | 0.06905 | 0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 5.33 | 36.2 |
| 6 | 0.02985 | 0 | 2.18 | 0 | 0.458 | 6.430 | 58.7 | 6.0622 | 3 | 222 | 18.7 | 5.21 | 28.7 |
# d
Boston %>% ggplot(mapping=aes(x = crim))+geom_histogram()
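Boston %>% ggplot(mapping=aes(x = crim))+geom_histogram(bins = 10) # most suburbs have a low per capita crime rate, but a few have values above 75
Boston %>% ggplot(mapping=aes(x = tax))+geom_histogram() # tax rates are mostly fairly low, though a sizeable group sits at the high end (the 3rd quartile is already 666)
Boston %>% ggplot(mapping=aes(x = ptratio))+geom_histogram() # pupil-teacher ratios are mostly on the high side; the teacher-pupil ratio (teachers per pupil) would arguably be the more intuitive measure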
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
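To make "particularly high" less subjective than reading it off a histogram, one option is the boxplot rule: flag values more than 1.5 times the interquartile range above the upper hinge. The sketch below uses `boxplot.stats()` from base R on eight `crim` values that appear in this notebook's Boston outputs (six from `head(Boston)`, two from the lowest-`medv` suburbs), so it only illustrates the mechanics; the 1.5 multiplier is the conventional default, not something the exercise prescribes.

```r
# Eight crim values copied from outputs in this notebook (not the full column)
crim <- c(0.00632, 0.02731, 0.02729, 0.03237, 0.06905, 0.02985, 38.3518, 67.9208)

# boxplot.stats() returns the five-number summary ($stats) and the flagged points ($out)
bs <- boxplot.stats(crim, coef = 1.5)
upper_fence <- bs$stats[4] + 1.5 * diff(bs$stats[c(2, 4)])
high_crime  <- bs$out[bs$out > upper_fence]

# Positions of the flagged values in the original vector
high_idx <- which(crim %in% high_crime)
```

On the full data the same call would be `boxplot.stats(Boston$crim)`; with a distribution this skewed, many suburbs would likely be flagged, so a quantile cut-off may be the more practical alternative.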
# e
Boston %>% filter(chas==1) %>% nrow() # 35 suburbs in total bound the Charles River
# f
Boston %>% select(ptratio) %>% summary() # the median is 19.05, the maximum 22 and the minimum 12.6, so this variable is fairly evenly spread
ptratio
Min. :12.60
1st Qu.:17.40
Median :19.05
Mean :18.46
3rd Qu.:20.20
Max. :22.00
# g
# ?Boston ‘medv’ median value of owner-occupied homes in $1000s.
Boston %>% filter(medv == min(medv)) # two suburbs share the minimum, but this output does not show their row indices
subset(Boston,medv==min(Boston$medv)) # subset() keeps the row names: rows 399 and 406
| crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | lstat | medv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <int> | <dbl> | <dbl> | <dbl> | <dbl> | <int> | <dbl> | <dbl> | <dbl> | <dbl> |
| 38.3518 | 0 | 18.1 | 0 | 0.693 | 5.453 | 100 | 1.4896 | 24 | 666 | 20.2 | 30.59 | 5 |
| 67.9208 | 0 | 18.1 | 0 | 0.693 | 5.683 | 100 | 1.4254 | 24 | 666 | 20.2 | 22.98 | 5 |
| | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | lstat | medv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | <dbl> | <dbl> | <dbl> | <int> | <dbl> | <dbl> | <dbl> | <dbl> | <int> | <dbl> | <dbl> | <dbl> | <dbl> |
| 399 | 38.3518 | 0 | 18.1 | 0 | 0.693 | 5.453 | 100 | 1.4896 | 24 | 666 | 20.2 | 30.59 | 5 |
| 406 | 67.9208 | 0 | 18.1 | 0 | 0.693 | 5.683 | 100 | 1.4254 | 24 | 666 | 20.2 | 22.98 | 5 |
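If only the row indices are needed, `which()` returns them directly without printing the rows. A minimal sketch on a hypothetical five-value `medv` vector (made-up apart from the minimum of 5):

```r
# Toy medv column with a tied minimum at positions 3 and 5
toy <- data.frame(medv = c(24.0, 21.6, 5, 34.7, 5))

# which() on the logical comparison returns every matching row index
min_rows <- which(toy$medv == min(toy$medv))

# which.min() returns only the first minimum, so it would miss the tie
first_min <- which.min(toy$medv)
```

On the real data, `which(Boston$medv == min(Boston$medv))` should give the same two indices that `subset()` shows as row names.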
# h
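# ?Boston # ‘rm’ average number of rooms per dwelling.
Boston %>% filter(rm>7) # 64 suburbs average more than 7 rooms per dwelling
Boston %>% filter(rm>8) # 13 suburbs average more than 8 rooms per dwelling
# Summary statistics for the rm > 8 suburbs
Boston %>% filter(rm>8) %>% summary()
# Compare against the full data set
Boston %>% summary()
# Comparing the means alone is enough here
# crim is lower (mean 0.72 vs 3.61), i.e. less crime; lstat is lower (4.31 vs 12.65) and medv is higher (44.2 vs 22.53); the other variables can be compared the same way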
| crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | lstat | medv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <int> | <dbl> | <dbl> | <dbl> | <dbl> | <int> | <dbl> | <dbl> | <dbl> | <dbl> |
| 0.02729 | 0.0 | 7.07 | 0 | 0.4690 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 4.03 | 34.7 |
| 0.06905 | 0.0 | 2.18 | 0 | 0.4580 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 5.33 | 36.2 |
| 0.03359 | 75.0 | 2.95 | 0 | 0.4280 | 7.024 | 15.8 | 5.4011 | 3 | 252 | 18.3 | 1.98 | 34.9 |
| 0.01311 | 90.0 | 1.22 | 0 | 0.4030 | 7.249 | 21.9 | 8.6966 | 5 | 226 | 17.9 | 4.81 | 35.4 |
| 0.01951 | 17.5 | 1.38 | 0 | 0.4161 | 7.104 | 59.5 | 9.2229 | 3 | 216 | 18.6 | 8.05 | 33.0 |
| 0.05660 | 0.0 | 3.41 | 0 | 0.4890 | 7.007 | 86.3 | 3.4217 | 2 | 270 | 17.8 | 5.50 | 23.6 |
| 0.05302 | 0.0 | 3.41 | 0 | 0.4890 | 7.079 | 63.1 | 3.4145 | 2 | 270 | 17.8 | 5.70 | 28.7 |
| 0.12083 | 0.0 | 2.89 | 0 | 0.4450 | 8.069 | 76.0 | 3.4952 | 2 | 276 | 18.0 | 4.21 | 38.7 |
| 0.08187 | 0.0 | 2.89 | 0 | 0.4450 | 7.820 | 36.9 | 3.4952 | 2 | 276 | 18.0 | 3.57 | 43.8 |
| 0.06860 | 0.0 | 2.89 | 0 | 0.4450 | 7.416 | 62.5 | 3.4952 | 2 | 276 | 18.0 | 6.19 | 33.2 |
| 1.46336 | 0.0 | 19.58 | 0 | 0.6050 | 7.489 | 90.8 | 1.9709 | 5 | 403 | 14.7 | 1.73 | 50.0 |
| 1.83377 | 0.0 | 19.58 | 1 | 0.6050 | 7.802 | 98.2 | 2.0407 | 5 | 403 | 14.7 | 1.92 | 50.0 |
| 1.51902 | 0.0 | 19.58 | 1 | 0.6050 | 8.375 | 93.9 | 2.1620 | 5 | 403 | 14.7 | 3.32 | 50.0 |
| 2.01019 | 0.0 | 19.58 | 0 | 0.6050 | 7.929 | 96.2 | 2.0459 | 5 | 403 | 14.7 | 3.70 | 50.0 |
| 0.06588 | 0.0 | 2.46 | 0 | 0.4880 | 7.765 | 83.3 | 2.7410 | 3 | 193 | 17.8 | 7.56 | 39.8 |
| 0.09103 | 0.0 | 2.46 | 0 | 0.4880 | 7.155 | 92.2 | 2.7006 | 3 | 193 | 17.8 | 4.82 | 37.9 |
| 0.05602 | 0.0 | 2.46 | 0 | 0.4880 | 7.831 | 53.6 | 3.1992 | 3 | 193 | 17.8 | 4.45 | 50.0 |
| 0.08370 | 45.0 | 3.44 | 0 | 0.4370 | 7.185 | 38.9 | 4.5667 | 5 | 398 | 15.2 | 5.39 | 34.9 |
| 0.08664 | 45.0 | 3.44 | 0 | 0.4370 | 7.178 | 26.3 | 6.4798 | 5 | 398 | 15.2 | 2.87 | 36.4 |
| 0.01381 | 80.0 | 0.46 | 0 | 0.4220 | 7.875 | 32.0 | 5.6484 | 4 | 255 | 14.4 | 2.97 | 50.0 |
| 0.04011 | 80.0 | 1.52 | 0 | 0.4040 | 7.287 | 34.1 | 7.3090 | 2 | 329 | 12.6 | 4.08 | 33.3 |
| 0.04666 | 80.0 | 1.52 | 0 | 0.4040 | 7.107 | 36.6 | 7.3090 | 2 | 329 | 12.6 | 8.61 | 30.3 |
| 0.03768 | 80.0 | 1.52 | 0 | 0.4040 | 7.274 | 38.3 | 7.3090 | 2 | 329 | 12.6 | 6.62 | 34.6 |
| 0.01778 | 95.0 | 1.47 | 0 | 0.4030 | 7.135 | 13.9 | 7.6534 | 3 | 402 | 17.0 | 4.45 | 32.9 |
| 0.02177 | 82.5 | 2.03 | 0 | 0.4150 | 7.610 | 15.7 | 6.2700 | 2 | 348 | 14.7 | 3.11 | 42.3 |
| 0.03510 | 95.0 | 2.68 | 0 | 0.4161 | 7.853 | 33.2 | 5.1180 | 4 | 224 | 14.7 | 3.81 | 48.5 |
| 0.02009 | 95.0 | 2.68 | 0 | 0.4161 | 8.034 | 31.9 | 5.1180 | 4 | 224 | 14.7 | 2.88 | 50.0 |
| 0.31533 | 0.0 | 6.20 | 0 | 0.5040 | 8.266 | 78.3 | 2.8944 | 8 | 307 | 17.4 | 4.14 | 44.8 |
| 0.52693 | 0.0 | 6.20 | 0 | 0.5040 | 8.725 | 83.0 | 2.8944 | 8 | 307 | 17.4 | 4.63 | 50.0 |
| 0.38214 | 0.0 | 6.20 | 0 | 0.5040 | 8.040 | 86.5 | 3.2157 | 8 | 307 | 17.4 | 3.13 | 37.6 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 0.33147 | 0 | 6.20 | 0 | 0.5070 | 8.247 | 70.4 | 3.6519 | 8 | 307 | 17.4 | 3.95 | 48.3 |
| 0.51183 | 0 | 6.20 | 0 | 0.5070 | 7.358 | 71.6 | 4.1480 | 8 | 307 | 17.4 | 4.73 | 31.5 |
| 0.36894 | 22 | 5.86 | 0 | 0.4310 | 8.259 | 8.4 | 8.9067 | 7 | 330 | 19.1 | 3.54 | 42.8 |
| 0.01538 | 90 | 3.75 | 0 | 0.3940 | 7.454 | 34.2 | 6.3361 | 3 | 244 | 15.9 | 3.11 | 44.0 |
| 0.61154 | 20 | 3.97 | 0 | 0.6470 | 8.704 | 86.9 | 1.8010 | 5 | 264 | 13.0 | 5.12 | 50.0 |
| 0.66351 | 20 | 3.97 | 0 | 0.6470 | 7.333 | 100.0 | 1.8946 | 5 | 264 | 13.0 | 7.79 | 36.0 |
| 0.54011 | 20 | 3.97 | 0 | 0.6470 | 7.203 | 81.8 | 2.1121 | 5 | 264 | 13.0 | 9.59 | 33.8 |
| 0.53412 | 20 | 3.97 | 0 | 0.6470 | 7.520 | 89.4 | 2.1398 | 5 | 264 | 13.0 | 7.26 | 43.1 |
| 0.52014 | 20 | 3.97 | 0 | 0.6470 | 8.398 | 91.5 | 2.2885 | 5 | 264 | 13.0 | 5.91 | 48.8 |
| 0.82526 | 20 | 3.97 | 0 | 0.6470 | 7.327 | 94.5 | 2.0788 | 5 | 264 | 13.0 | 11.25 | 31.0 |
| 0.55007 | 20 | 3.97 | 0 | 0.6470 | 7.206 | 91.6 | 1.9301 | 5 | 264 | 13.0 | 8.10 | 36.5 |
| 0.78570 | 20 | 3.97 | 0 | 0.6470 | 7.014 | 84.6 | 2.1329 | 5 | 264 | 13.0 | 14.79 | 30.7 |
| 0.57834 | 20 | 3.97 | 0 | 0.5750 | 8.297 | 67.0 | 2.4216 | 5 | 264 | 13.0 | 7.44 | 50.0 |
| 0.54050 | 20 | 3.97 | 0 | 0.5750 | 7.470 | 52.6 | 2.8720 | 5 | 264 | 13.0 | 3.16 | 43.5 |
| 0.22188 | 20 | 6.96 | 1 | 0.4640 | 7.691 | 51.8 | 4.3665 | 3 | 223 | 18.6 | 6.58 | 35.2 |
| 0.10469 | 40 | 6.41 | 1 | 0.4470 | 7.267 | 49.0 | 4.7872 | 4 | 254 | 17.6 | 6.05 | 33.2 |
| 0.03578 | 20 | 3.33 | 0 | 0.4429 | 7.820 | 64.5 | 4.6947 | 5 | 216 | 14.9 | 3.76 | 45.4 |
| 0.06129 | 20 | 3.33 | 1 | 0.4429 | 7.645 | 49.7 | 5.2119 | 5 | 216 | 14.9 | 3.01 | 46.0 |
| 0.01501 | 90 | 1.21 | 1 | 0.4010 | 7.923 | 24.8 | 5.8850 | 1 | 198 | 13.6 | 3.16 | 50.0 |
| 0.00906 | 90 | 2.97 | 0 | 0.4000 | 7.088 | 20.8 | 7.3073 | 1 | 285 | 15.3 | 7.85 | 32.2 |
| 0.07886 | 80 | 4.95 | 0 | 0.4110 | 7.148 | 27.7 | 5.1167 | 4 | 245 | 19.2 | 3.56 | 37.3 |
| 0.05561 | 70 | 2.24 | 0 | 0.4000 | 7.041 | 10.0 | 7.8278 | 5 | 358 | 14.8 | 4.74 | 29.0 |
| 0.05515 | 33 | 2.18 | 0 | 0.4720 | 7.236 | 41.1 | 4.0220 | 7 | 222 | 18.4 | 6.93 | 36.1 |
| 0.07503 | 33 | 2.18 | 0 | 0.4720 | 7.420 | 71.9 | 3.0992 | 7 | 222 | 18.4 | 6.47 | 33.4 |
| 0.01301 | 35 | 1.52 | 0 | 0.4420 | 7.241 | 49.3 | 7.0379 | 1 | 284 | 15.5 | 5.49 | 32.7 |
| 3.47428 | 0 | 18.10 | 1 | 0.7180 | 8.780 | 82.9 | 1.9047 | 24 | 666 | 20.2 | 5.29 | 21.9 |
| 6.53876 | 0 | 18.10 | 1 | 0.6310 | 7.016 | 97.5 | 1.2024 | 24 | 666 | 20.2 | 2.96 | 50.0 |
| 19.60910 | 0 | 18.10 | 0 | 0.6710 | 7.313 | 97.9 | 1.3163 | 24 | 666 | 20.2 | 13.44 | 15.0 |
| 8.24809 | 0 | 18.10 | 0 | 0.7130 | 7.393 | 99.3 | 2.4527 | 24 | 666 | 20.2 | 16.74 | 17.8 |
| 5.73116 | 0 | 18.10 | 0 | 0.5320 | 7.061 | 77.0 | 3.4106 | 24 | 666 | 20.2 | 7.01 | 25.0 |
| crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | lstat | medv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <int> | <dbl> | <dbl> | <dbl> | <dbl> | <int> | <dbl> | <dbl> | <dbl> | <dbl> |
| 0.12083 | 0 | 2.89 | 0 | 0.4450 | 8.069 | 76.0 | 3.4952 | 2 | 276 | 18.0 | 4.21 | 38.7 |
| 1.51902 | 0 | 19.58 | 1 | 0.6050 | 8.375 | 93.9 | 2.1620 | 5 | 403 | 14.7 | 3.32 | 50.0 |
| 0.02009 | 95 | 2.68 | 0 | 0.4161 | 8.034 | 31.9 | 5.1180 | 4 | 224 | 14.7 | 2.88 | 50.0 |
| 0.31533 | 0 | 6.20 | 0 | 0.5040 | 8.266 | 78.3 | 2.8944 | 8 | 307 | 17.4 | 4.14 | 44.8 |
| 0.52693 | 0 | 6.20 | 0 | 0.5040 | 8.725 | 83.0 | 2.8944 | 8 | 307 | 17.4 | 4.63 | 50.0 |
| 0.38214 | 0 | 6.20 | 0 | 0.5040 | 8.040 | 86.5 | 3.2157 | 8 | 307 | 17.4 | 3.13 | 37.6 |
| 0.57529 | 0 | 6.20 | 0 | 0.5070 | 8.337 | 73.3 | 3.8384 | 8 | 307 | 17.4 | 2.47 | 41.7 |
| 0.33147 | 0 | 6.20 | 0 | 0.5070 | 8.247 | 70.4 | 3.6519 | 8 | 307 | 17.4 | 3.95 | 48.3 |
| 0.36894 | 22 | 5.86 | 0 | 0.4310 | 8.259 | 8.4 | 8.9067 | 7 | 330 | 19.1 | 3.54 | 42.8 |
| 0.61154 | 20 | 3.97 | 0 | 0.6470 | 8.704 | 86.9 | 1.8010 | 5 | 264 | 13.0 | 5.12 | 50.0 |
| 0.52014 | 20 | 3.97 | 0 | 0.6470 | 8.398 | 91.5 | 2.2885 | 5 | 264 | 13.0 | 5.91 | 48.8 |
| 0.57834 | 20 | 3.97 | 0 | 0.5750 | 8.297 | 67.0 | 2.4216 | 5 | 264 | 13.0 | 7.44 | 50.0 |
| 3.47428 | 0 | 18.10 | 1 | 0.7180 | 8.780 | 82.9 | 1.9047 | 24 | 666 | 20.2 | 5.29 | 21.9 |
crim zn indus chas
Min. :0.02009 Min. : 0.00 Min. : 2.680 Min. :0.0000
1st Qu.:0.33147 1st Qu.: 0.00 1st Qu.: 3.970 1st Qu.:0.0000
Median :0.52014 Median : 0.00 Median : 6.200 Median :0.0000
Mean :0.71879 Mean :13.62 Mean : 7.078 Mean :0.1538
3rd Qu.:0.57834 3rd Qu.:20.00 3rd Qu.: 6.200 3rd Qu.:0.0000
Max. :3.47428 Max. :95.00 Max. :19.580 Max. :1.0000
nox rm age dis
Min. :0.4161 Min. :8.034 Min. : 8.40 Min. :1.801
1st Qu.:0.5040 1st Qu.:8.247 1st Qu.:70.40 1st Qu.:2.288
Median :0.5070 Median :8.297 Median :78.30 Median :2.894
Mean :0.5392 Mean :8.349 Mean :71.54 Mean :3.430
3rd Qu.:0.6050 3rd Qu.:8.398 3rd Qu.:86.50 3rd Qu.:3.652
Max. :0.7180 Max. :8.780 Max. :93.90 Max. :8.907
rad tax ptratio lstat medv
Min. : 2.000 Min. :224.0 Min. :13.00 Min. :2.47 Min. :21.9
1st Qu.: 5.000 1st Qu.:264.0 1st Qu.:14.70 1st Qu.:3.32 1st Qu.:41.7
Median : 7.000 Median :307.0 Median :17.40 Median :4.14 Median :48.3
Mean : 7.462 Mean :325.1 Mean :16.36 Mean :4.31 Mean :44.2
3rd Qu.: 8.000 3rd Qu.:307.0 3rd Qu.:17.40 3rd Qu.:5.12 3rd Qu.:50.0
Max. :24.000 Max. :666.0 Max. :20.20 Max. :7.44 Max. :50.0
crim zn indus chas
Min. : 0.00632 Min. : 0.00 Min. : 0.46 Min. :0.00000
1st Qu.: 0.08205 1st Qu.: 0.00 1st Qu.: 5.19 1st Qu.:0.00000
Median : 0.25651 Median : 0.00 Median : 9.69 Median :0.00000
Mean : 3.61352 Mean : 11.36 Mean :11.14 Mean :0.06917
3rd Qu.: 3.67708 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.00000
Max. :88.97620 Max. :100.00 Max. :27.74 Max. :1.00000
nox rm age dis
Min. :0.3850 Min. :3.561 Min. : 2.90 Min. : 1.130
1st Qu.:0.4490 1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100
Median :0.5380 Median :6.208 Median : 77.50 Median : 3.207
Mean :0.5547 Mean :6.285 Mean : 68.57 Mean : 3.795
3rd Qu.:0.6240 3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188
Max. :0.8710 Max. :8.780 Max. :100.00 Max. :12.127
rad tax ptratio lstat
Min. : 1.000 Min. :187.0 Min. :12.60 Min. : 1.73
1st Qu.: 4.000 1st Qu.:279.0 1st Qu.:17.40 1st Qu.: 6.95
Median : 5.000 Median :330.0 Median :19.05 Median :11.36
Mean : 9.549 Mean :408.2 Mean :18.46 Mean :12.65
3rd Qu.:24.000 3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:16.95
Max. :24.000 Max. :711.0 Max. :22.00 Max. :37.97
medv
Min. : 5.00
1st Qu.:17.02
Median :21.20
Mean :22.53
3rd Qu.:25.00
Max. :50.00
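Reading the two `summary()` outputs side by side is tedious; stacking the column means into one small table makes the comparison easier. A sketch assuming a six-row toy subset (three rows from `head(Boston)` and three from the `rm > 8` table above), so the means are not those of the full data:

```r
# Six rows copied from the outputs above: three ordinary suburbs, three with rm > 8
toy <- data.frame(
  crim  = c(0.00632, 0.02731, 0.02985, 0.12083, 1.51902, 0.02009),
  rm    = c(6.575, 6.421, 6.430, 8.069, 8.375, 8.034),
  lstat = c(4.98, 9.14, 5.21, 4.21, 3.32, 2.88),
  medv  = c(24.0, 21.6, 28.7, 38.7, 50.0, 50.0)
)

big <- toy[toy$rm > 8, ]

# One row of means per group, so the columns line up for direct comparison
comparison <- rbind(rm_gt_8 = colMeans(big), all = colMeans(toy))
print(round(comparison, 2))
```

With the real data, replacing `toy` by `Boston` gives the same layout for all 13 variables.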